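Effective communication requires adapting to the idiosyncratic common ground shared with each communication partner. We study a particularly challenging instance of this problem: the popular game Dixit. We formulate a round of Dixit as a multi-agent image reference game. In it, a (trained) speaker model describes a target image so that one (pretrained) listener model can correctly identify it among a set of distractors, but another listener cannot. To succeed in this setting, the speaker must exploit differences in the common ground it shares with the different listeners. We show that, in this contrastive multi-agent setting, training an attention-based adapter between a CLIP vision encoder and a large language model gives rise to context-dependent natural-language specialization without direct supervision. In a series of controlled experiments, we show that the speaker can adapt to the idiosyncratic strengths and weaknesses of various pairs of listeners. Furthermore, we show zero-shot transfer of the speaker's specialization to unseen real-world data. Our experiments are a step toward adaptive communication in complex multi-partner settings, and they highlight the interesting research challenges posed by games such as Dixit. We hope that our work inspires creative new approaches to adapting pretrained models.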
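The sketch below shows one way the contrastive objective could score a speaker's caption. It rests on assumptions the abstract does not state. Each listener assigns a similarity score between the caption and every candidate image, and it picks an image by softmax over those scores. The speaker's reward is the log-probability that listener A picks the target minus the same quantity for listener B. The names `contrastive_reward` and `log_softmax`, and the scores, are illustrative only.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one listener's candidate scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def contrastive_reward(scores_a: Sequence[float], scores_b: Sequence[float], target: int) -> float:
    """Hypothetical speaker reward: large when listener A assigns high probability
    to the target image and listener B assigns low probability to it."""
    return log_softmax(scores_a)[target] - log_softmax(scores_b)[target]


# Hypothetical caption-image similarity scores; index 0 is the target image.
scores_a = [5.0, 1.0, 0.5]   # listener A clearly prefers the target
scores_b = [1.0, 1.0, 1.0]   # listener B cannot tell the images apart
print(contrastive_reward(scores_a, scores_b, target=0))  # positive reward
```

The reward is a difference, so a caption that both listeners understand equally well earns zero. This pushes the speaker toward the common ground it shares with listener A but not with listener B.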
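In this paper, we propose a dynamic cascaded encoder automatic speech recognition (ASR) model that unifies the models for different deployment scenarios. The model can also significantly reduce model size and power consumption without loss of quality. With the dynamic cascaded encoder model, we explore three techniques to maximize the performance of each model size:

1. Using a separate decoder for each sub-model while sharing the encoders.
2. Using funnel pooling to improve encoder efficiency.
3. Balancing the sizes of the causal and non-causal encoders to improve quality and fit deployment constraints.

Overall, compared with the baseline cascaded encoder model, the proposed large-medium model is 30% smaller and reduces power consumption by 33%. The triple-size model, which unifies the large, medium, and small models, achieves a 37% reduction in total size with minimal quality loss. It also substantially reduces the engineering effort of maintaining separate models.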
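Funnel pooling shortens the sequence that later encoder layers must process. The sketch below assumes a simple variant: non-overlapping mean pooling over time with a stride of 2, applied to per-frame feature vectors. The paper's exact pooling may differ, and `funnel_pool` is a hypothetical helper.

```python
from typing import List

Frame = List[float]


def funnel_pool(frames: List[Frame], stride: int = 2) -> List[Frame]:
    """Average each group of `stride` consecutive frames along the time axis.
    A trailing partial group is averaged over the frames it actually contains."""
    pooled = []
    for start in range(0, len(frames), stride):
        group = frames[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(f[k] for f in group) / len(group) for k in range(dim)])
    return pooled


frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]]
print(funnel_pool(frames))  # [[2.0, 1.0], [6.0, 5.0], [9.0, 8.0]]
```

Halving the number of frames roughly halves the cost of the subsequent feed-forward layers. It cuts self-attention cost by about a factor of four, because attention scales quadratically with sequence length.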
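We develop the first fast spectral algorithm for decomposing random third-order tensors over $\mathbb{R}^d$ of rank up to $O(d^{3/2}/\text{polylog}(d))$. Our algorithm involves only simple linear-algebra operations. Under the current matrix multiplication time, it recovers all components in time $O(d^{6.05})$. Before this work, comparable guarantees could only be achieved via sum-of-squares [Ma, Shi, Steurer 2016]. In contrast, fast algorithms [Hopkins, Schramm, Shi, Steurer 2016] can only decompose tensors of rank at most $O(d^{4/3}/\text{polylog}(d))$. Our algorithmic result rests on two key ingredients:

1. A clean lifting of the third-order tensor to a sixth-order tensor, which can be expressed in the language of tensor networks.
2. A careful decomposition of the tensor network into a sequence of rectangular matrix multiplications, which allows a fast implementation of the algorithm.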
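The tensor-network view rests on the fact that contracting a tensor with vectors reduces to ordinary matrix products. The sketch below is not the paper's algorithm, which works with a sixth-order lifting. It only checks the identity $T(\cdot,\cdot,x) = A\,\mathrm{diag}(A^\top x)\,A^\top$ for a symmetric tensor $T = \sum_i a_i \otimes a_i \otimes a_i$, using hypothetical Gaussian components. This identity underlies many spectral tensor methods.

```python
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def symmetric_tensor(components: List[Vec]) -> List[Mat]:
    """T[j][k][l] = sum_i a_i[j] * a_i[k] * a_i[l]: a symmetric order-3 tensor of rank <= n."""
    d = len(components[0])
    return [[[sum(a[j] * a[k] * a[l] for a in components) for l in range(d)]
             for k in range(d)] for j in range(d)]


def contract_last_mode(T: List[Mat], x: Vec) -> Mat:
    """Contract the third mode with x: M[j][k] = sum_l T[j][k][l] * x[l]."""
    d = len(x)
    return [[sum(T[j][k][l] * x[l] for l in range(d)) for k in range(d)] for j in range(d)]


def matmul(P: Mat, Q: Mat) -> Mat:
    """Plain (rectangular) matrix product P @ Q."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def contract_via_matmul(components: List[Vec], x: Vec) -> Mat:
    """The same contraction written as A diag(A^T x) A^T, with components as columns of A."""
    d, n = len(x), len(components)
    w = [sum(a[l] * x[l] for l in range(d)) for a in components]                # A^T x
    A_scaled = [[components[i][j] * w[i] for i in range(n)] for j in range(d)]  # A diag(w), d x n
    A_T = [list(a) for a in components]                                         # A^T, n x d
    return matmul(A_scaled, A_T)


rng = random.Random(0)
d, n = 4, 6  # n > d: an overcomplete rank, as in the setting of the abstract
comps = [[rng.gauss(0.0, 1.0 / d ** 0.5) for _ in range(d)] for _ in range(n)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
M1 = contract_last_mode(symmetric_tensor(comps), x)
M2 = contract_via_matmul(comps, x)
print(max(abs(M1[j][k] - M2[j][k]) for j in range(d) for k in range(d)))  # only round-off
```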
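We develop an efficient algorithm for weak recovery in a robust version of the stochastic block model. The algorithm matches the statistical guarantees of the best known algorithms for the vanilla version of the stochastic block model. In this sense, our results show that there is no price of robustness in the stochastic block model. Our work is heavily inspired by recent work of Banks, Mohanty, and Raghavendra (SODA 2021), which provided an efficient algorithm for the corresponding distinguishing problem. Our algorithm and its analysis depart significantly from previous work on robust recovery.

A key challenge is the peculiar optimization landscape underlying our algorithm. The planted partition may be far from optimal, in the sense that completely unrelated solutions can achieve the same objective value. This phenomenon is related to the push-out effect at the BBP phase transition for PCA. To the best of our knowledge, our algorithm is the first to achieve robust recovery in the presence of such a push-out effect in a non-asymptotic setting.

Our algorithm is an instantiation of a framework based on convex optimization, related to but distinct from sum-of-squares. This framework may be useful for other robust matrix estimation problems. A by-product of our analysis is a general boosting technique. It raises the success probability (over the randomness of the input) of an arbitrary robust weak-recovery algorithm from constant (or slowly vanishing) to exponentially high.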
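The abstract does not define weak recovery. The sketch below assumes the standard two-community formulation:

- Labels are $\pm 1$.
- Edges appear with probability $a/n$ within a community and $b/n$ across communities.
- An estimate achieves weak recovery if its overlap with the truth stays bounded away from zero as $n$ grows.

For the vanilla model, weak recovery is possible in polynomial time when $(a-b)^2 > 2(a+b)$. The robust variant also lets an adversary modify part of the graph, which this sketch omits. The function names are illustrative.

```python
import random
from typing import List, Set, Tuple


def sample_sbm(n: int, a: float, b: float, seed: int = 0) -> Tuple[List[int], Set[Tuple[int, int]]]:
    """Sparse two-community SBM: each node gets a label in {-1, +1}; an edge appears
    with probability a/n inside a community and b/n across communities."""
    rng = random.Random(seed)
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = (a if labels[i] == labels[j] else b) / n
            if rng.random() < p:
                edges.add((i, j))
    return labels, edges


def overlap(truth: List[int], estimate: List[int]) -> float:
    """Agreement between +/-1 labelings up to a global sign flip (community names
    are arbitrary). Weak recovery asks for an overlap bounded away from 0."""
    return abs(sum(t * e for t, e in zip(truth, estimate))) / len(truth)


# (a - b)^2 = 16 > 2(a + b) = 12, so these parameters are above the threshold.
labels, edges = sample_sbm(n=200, a=5.0, b=1.0)
print(len(edges), overlap(labels, [-v for v in labels]))  # relabeled truth still has overlap 1.0
```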
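Quantum noise is the key challenge in Noisy Intermediate-Scale Quantum (NISQ) computers. Previous work on mitigating noise has focused mainly on gate-level or pulse-level noise-adaptive compilation. Few research efforts have explored a higher level of optimization: making the quantum circuits themselves resilient to noise. We propose QuantumNAS, a comprehensive framework for noise-adaptive co-search of the variational circuit and qubit mapping. Variational quantum circuits are a promising approach for building QML and quantum simulation. However, finding the best variational circuit and its optimal parameters is challenging because of the large design space and the cost of parameter training.

We propose to decouple circuit search from parameter training by introducing a novel SuperCircuit. The SuperCircuit is built from multiple layers of predefined parameterized gates. It is trained by iteratively sampling and updating subsets of its parameters (SubCircuits). It provides an accurate estimate of the performance of SubCircuits trained from scratch. We then perform an evolutionary co-search of the SubCircuit and its qubit mapping. SubCircuit performance is estimated with parameters inherited from the SuperCircuit and simulated with real device noise models. Finally, we perform iterative gate pruning and finetuning to remove redundant gates.

We evaluated QuantumNAS extensively on 12 QML and VQE benchmarks across 10 quantum computers, and it significantly outperforms the baselines. For QML, QuantumNAS is the first to demonstrate over 95% 2-class, 85% 4-class, and 32% 10-class classification accuracy on real quantum computers. Compared with UCCSD, it also achieves the lowest eigenvalue for VQE tasks on H2, H2O, LiH, CH4, and BeH2. We also open-source QuantumEngine (https://github.com/mit-han-lab/pytorch-quantum) for fast training of parameterized quantum circuits to facilitate future research.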
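Iterative gate pruning can be illustrated with a magnitude criterion. This is an assumption, not necessarily the paper's rule. A single-qubit rotation whose angle is close to a multiple of $2\pi$ acts almost as the identity (up to a global phase), so the sketch treats it as redundant and prunes a fixed fraction of such gates. In an iterative scheme, each pruning step would be followed by finetuning the remaining angles; that step is omitted here. `Gate` and `prune_gates` are hypothetical names.

```python
import math
from typing import List, NamedTuple


class Gate(NamedTuple):
    name: str     # single-parameter rotation, e.g. "rx", "ry", "rz"
    qubit: int
    theta: float  # rotation angle in radians


def angle_magnitude(theta: float) -> float:
    """Distance from theta to the nearest multiple of 2*pi. A rotation by a multiple
    of 2*pi equals the identity up to a global phase, so small values indicate a
    nearly redundant gate."""
    r = math.fmod(theta, 2 * math.pi)
    if r < 0:
        r += 2 * math.pi
    return min(r, 2 * math.pi - r)


def prune_gates(gates: List[Gate], ratio: float) -> List[Gate]:
    """Drop the `ratio` fraction of gates with the smallest angle magnitude,
    keeping the remaining gates in their original circuit order."""
    k = int(len(gates) * ratio)
    by_magnitude = sorted(range(len(gates)), key=lambda i: angle_magnitude(gates[i].theta))
    dropped = set(by_magnitude[:k])
    return [g for i, g in enumerate(gates) if i not in dropped]


circuit = [Gate("rx", 0, 0.01), Gate("ry", 1, 1.2),
           Gate("rz", 0, 2 * math.pi - 0.02), Gate("rx", 1, -0.8)]
print(prune_gates(circuit, ratio=0.5))  # keeps the ry(1.2) and rx(-0.8) gates
```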
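Urban areas consume two-thirds of the world's energy and account for more than 70% of global CO2 emissions. As stated in the IPCC's Global Warming of 1.5°C report, achieving carbon neutrality by 2050 requires a clear understanding of urban geometry. High-quality building footprints extracted from satellite imagery can accelerate this predictive process and empower municipal decision-making at scale. However, previous deep learning approaches face consequential issues such as scale invariance and defective footprints, partly because of persistent class imbalance. In addition, most approaches require supplemental data such as point clouds, building height information, and multi-band imagery, which have limited availability and are tedious to produce.

In this paper, we propose a modified DeepLabv3+ module with a dilated ResNet backbone that generates building footprint masks from three-channel RGB satellite imagery alone. We also introduce an F-beta measure into the objective function. It helps the model account for skewed class distributions and prevent false-positive footprints. In addition to F-beta, we incorporate an exponentially weighted boundary loss and use a cross-dataset training strategy to further improve prediction quality. As a result, we achieve state-of-the-art performance across three public benchmarks. We also show that our RGB-only method produces higher-quality visual results and is agnostic to the scale, resolution, and urban density of the satellite imagery.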
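A differentiable F-beta term can be built from "soft" confusion counts. The sketch below assumes a binary building/background mask and per-pixel foreground probabilities. It uses $F_\beta = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2 FN + FP}$ and returns $1 - F_\beta$ as the loss. The abstract gives neither the paper's exact formulation nor its choice of $\beta$, and `soft_fbeta_loss` is an illustrative name.

```python
from typing import Sequence


def soft_fbeta_loss(probs: Sequence[float], targets: Sequence[float],
                    beta: float = 1.0, eps: float = 1e-7) -> float:
    """1 - soft F-beta between per-pixel foreground probabilities and a binary mask
    (both flattened). A foreground pixel with probability p adds p to TP and 1 - p
    to FN; a background pixel adds p to FP."""
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    b2 = beta * beta
    f_beta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
    return 1.0 - f_beta


mask = [1, 1, 0, 0]
extra_pixel = [1.0, 1.0, 1.0, 0.0]    # one false-positive pixel
missed_pixel = [1.0, 0.0, 0.0, 0.0]   # one false-negative pixel
# With beta < 1, precision is weighted more, so the false positive costs more.
print(soft_fbeta_loss(extra_pixel, mask, beta=0.5) > soft_fbeta_loss(missed_pixel, mask, beta=0.5))
```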
In this paper, we propose a novel technique, INVALIDATOR, that automatically assesses the correctness of patches generated by automated program repair (APR) through semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants. It captures program syntax via language semantics learned from a large code corpus by a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. It then determines that an APR-generated patch overfits if the patch (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. If the invariants do not show that a patch overfits, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax.

The benefit of INVALIDATOR is threefold:

1. It leverages both semantic and syntactic reasoning to enhance its discriminative capability.
2. It does not require new test cases to be generated. It relies only on the current test suite and uses invariant inference to generalize the behaviors of a program.
3. It is fully automated.

We conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% in Accuracy and 19% in F-Measure.
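The two-stage decision can be sketched as follows. It is a simplification with these assumptions:

- Invariants are modeled as plain sets of strings, as an invariant-inference tool might report them.
- "Correct specifications" are the invariants that the developer fix introduces.
- "Erroneous behaviors" are the invariants that the developer fix removes.
- `syntactic_score` stands in for the trained syntax-based model.

All names are hypothetical.

```python
from typing import Callable, FrozenSet

Invariants = FrozenSet[str]


def is_overfitting(patch: Invariants, buggy: Invariants, developer: Invariants,
                   syntactic_score: Callable[[], float], threshold: float = 0.5) -> bool:
    """Two-stage check. Invariants are modeled as sets of strings such as
    "ret > 0" (a simplification); `syntactic_score` is a hypothetical trained
    model returning the probability that the patch overfits."""
    correct_specs = developer - buggy    # behaviors the developer fix introduced
    buggy_behaviors = buggy - developer  # behaviors the developer fix removed
    # (1) The patch violates a correct specification.
    if not correct_specs <= patch:
        return True
    # (2) The patch keeps an erroneous behavior of the buggy program.
    if buggy_behaviors & patch:
        return True
    # Invariants are inconclusive: fall back to the syntactic classifier.
    return syntactic_score() >= threshold


buggy = frozenset({"x >= 0", "ret == x"})
developer = frozenset({"x >= 0", "ret == x + 1"})
print(is_overfitting(frozenset({"x >= 0"}), buggy, developer, lambda: 0.1))  # True: rule (1)
```

Passing the classifier as a callable means it is only evaluated when the invariant checks are inconclusive.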
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and building tools that use technology to aid in the conservation of wildlife. In this article, we use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind, and we provide a framework for creating successful tools. The case studies span a range of complexity, from simple cat collars to machine learning and game-theoretic methods. Our goal is to introduce the field of conservation technology to current and future researchers, and to provide references for educating the next generation of conservation technologists. Conservation technology can benefit biodiversity directly, and it also has broader impacts on fields such as sustainability and environmental protection. By applying innovative technologies to conservation challenges, we can find more effective and efficient ways to protect and preserve our planet's resources.
Different people speak in diverse, personalized styles. Existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motion. However, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot, style-controllable talking face generation framework. In a nutshell, we extract a speaking style from an arbitrary reference video and then drive a single portrait to speak a separate audio clip in that reference style.

Specifically, we first develop a style encoder that extracts dynamic facial motion patterns from a style reference video and encodes them into a style code. We then introduce a style-controllable decoder that synthesizes stylized facial animations from the speech content and the style code. To integrate the reference speaking style into the generated videos, we design a style-aware adaptive transformer, which lets the encoded style code adjust the weights of the feed-forward layers. This style-aware adaptation mechanism embeds the reference speaking style more effectively into the synthesized videos during decoding. Extensive experiments demonstrate that our method generates talking head videos with diverse speaking styles and authentic visual effects from only one portrait image and an audio clip. Project page: https://github.com/FuxiVirtualHuman/styletalk.
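One plausible reading of "the style code adjusts the weights of the feed-forward layers" is a mixture of candidate kernels. The sketch below follows that reading as an assumption, not the paper's exact design. A gating map turns the style code into softmax weights over $K$ candidate weight matrices, and the layer applies their weighted sum followed by ReLU. `style_adaptive_ffn` and the toy kernels are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]
Mat = List[List[float]]


def softmax(z: Sequence[float]) -> Vec:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W: Mat, x: Sequence[float]) -> Vec:
    """Matrix-vector product W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def style_adaptive_ffn(h: Vec, style_code: Vec, kernels: List[Mat], gate: Mat) -> Vec:
    """Feed-forward layer whose weight matrix is a style-dependent mixture of K
    candidate kernels. `gate` (K x style_dim) maps the style code to K logits."""
    alpha = softmax(matvec(gate, style_code))           # mixture weights, sum to 1
    rows, cols = len(kernels[0]), len(kernels[0][0])
    W = [[sum(a * Wk[i][j] for a, Wk in zip(alpha, kernels)) for j in range(cols)]
         for i in range(rows)]                          # W = sum_k alpha_k * W_k
    return [max(0.0, v) for v in matvec(W, h)]          # ReLU(W h)


kernels = [[[1.0, 0.0], [0.0, 1.0]],   # identity
           [[2.0, 0.0], [0.0, 2.0]]]   # doubling
gate = [[10.0], [-10.0]]
h = [1.0, 3.0]
print(style_adaptive_ffn(h, [1.0], kernels, gate))   # close to h
print(style_adaptive_ffn(h, [-1.0], kernels, gate))  # close to 2 * h
```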
Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks because of their flexibility, mobility, low cost, and ability to provide services collaboratively and autonomously. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms: federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning. We then give a comprehensive overview of their applications to UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Next, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surfaces (RIS), virtual reality (VR), and semantic communications. For each, we discuss the problems and challenges that DL-enabled UAV swarms can solve. Finally, we describe open problems in using DL in UAV swarms and future research directions for DL-enabled UAV swarms. In summary, this survey gives a comprehensive overview of DL applications for UAV swarms across a wide range of scenarios.
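As a concrete example of one surveyed technique, the sketch below shows the aggregation step of federated averaging (FedAvg), the classic federated learning scheme. Each UAV trains locally. An aggregator then averages the parameter vectors, weighted by local dataset size; treating the aggregator as a ground station or a leader UAV is an assumption here. The names are illustrative. Real deployments add client sampling and communication constraints, and often compression.

```python
from typing import List, Sequence


def fed_avg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Federated averaging: weight each client's parameter vector by its number of
    local training samples and average coordinate-wise."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total
            for k in range(dim)]


# Two hypothetical UAVs after a round of local training.
uav_params = [[1.0, 2.0], [3.0, 6.0]]
uav_samples = [1, 3]
print(fed_avg(uav_params, uav_samples))  # [2.5, 5.0]
```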